constituent filters running in parallel to model a desired signal. We use "Bregman
divergences" and obtain certain multiplicative updates to train the linear combination
weights under an affine constraint or without any constraints. We use unnormalized
relative entropy and relative entropy to define two different Bregman divergences
that produce an unnormalized exponentiated gradient update and a normalized
exponentiated gradient update on the mixture weights, respectively. We then carry
out the mean and the mean-square transient analysis of these adaptive algorithms
when they are used to combine outputs of m constituent filters. We illustrate the
accuracy of our results and demonstrate the effectiveness of these updates for sparse

1 Introduction

In this paper, we study adaptive mixture methods based on "Bregman diver- 
gences" [1112] that combine outputs of m constituent filters running in parallel 
on the same task. The overall system has two stages [HHH]. The first stage con- 
tains adaptive filters running in parallel to model a desired signal. The outputs 
of these adaptive filters are then linearly combined to produce the final output 
of the overall system in the second stage. We use Bregman divergences and 
obtain certain multiplicative updates [H], [2], [ID] to train these linear combi-
nation weights under an affine constraint [11] or without any constraints [T^ .
We use unnormalized [2] and normalized relative entropy [9] to define two 
different Bregman divergences that produce the unnormalized exponentiated 
gradient update (EGU) and the exponentiated gradient update (EG) on the 
mixture weights [9], respectively. We then perform the mean and the mean- 
square transient analysis of these adaptive mixtures when they are used to 
combine outputs of m constituent filters. We emphasize that to the best of 
our knowledge, this is the first mean and mean-square transient analysis of the 
EGU algorithm and the EG algorithm in the mixture framework (which nat- 
urally covers the classical framework also [l3l[Tl]). We illustrate the accuracy 
of our results through simulations in different configurations and demonstrate 
advantages of the introduced algorithms for sparse mixture systems. 

Adaptive mixture methods are utilized in a wide range of signal processing
applications in order to improve the steady-state and/or convergence performance
over the constituent filters [11], [12], [15]. An adaptive convexly constrained
mixture of two filters is studied in [15], where the convex combination is shown
to be "universal" such that the combination performs at least as well as its
best constituent filter in the steady state [15]. The transient analysis of this
adaptive convex combination is studied in [16], where the time evolution of
the mean and variance of the mixture weights is provided. Along similar lines,
an affinely constrained mixture of adaptive filters using a stochastic gradient
update is introduced in [11]. The steady-state mean square error (MSE)
of this affinely constrained mixture is shown to outperform the steady-state
MSE of the best constituent filter in the mixture under certain conditions [11].
The transient analysis of this affinely constrained mixture for m constituent
filters is carried out in pTj. The general linear mixture framework as well as
the steady-state performances of different mixture configurations are studied
in [12].



In this paper, we use Bregman divergences to derive multiplicative updates 
on the mixture weights. We use the unnormalized relative entropy and the 
relative entropy as distance measures and obtain the EGU algorithm and the 
EG algorithm to update the combination weights under an affine constraint 
or without any constraints. We then carry out the mean and the mean-square 
transient analysis of these adaptive mixtures when they are used to combine 
m constituent filters. We point out that the EG algorithm is widely used in 
sequential learning theory [TB] and minimizes an approximate final estimation 
error while penalizing the distance between the new and the old filter weights. 
In network and acoustic echo cancellation applications, the EG algorithm is 
shown to converge faster than the LMS algorithm [21 (TH] when the system 
impulse response is sparse p^. Similarly, in our simulations, we observe that 
using the EG algorithm to train the mixture weights yields increased con- 
vergence speed compared to using the LMS algorithm to train the mixture 
weights [11], [12] when the combination favors only a few of the constituent
filters in the steady state, i.e., when the final steady-state combination vector 
is sparse. We also observe that the EGU algorithm and the LMS algorithm 
show similar performance when they are used to train the mixture weights 
even if the final steady-state mixture is sparse. 

To summarize, the main contributions of this paper are as follows: 



• We use Bregman divergences to derive multiplicative updates on affinely 
constrained and unconstrained mixture weights adaptively combining out- 
puts of m constituent filters. 

• We use the unnormalized relative entropy and the relative entropy to define 
two different Bregman divergences that produce the EGU algorithm and the 
EG algorithm to update the affinely constrained and unconstrained mixture 
weights. 

• We perform the mean and the mean-square transient analysis of the affinely 
constrained and unconstrained mixtures using the EGU algorithm and the 
EG algorithm. 

The organization of the paper is as follows. In Section 2, we first describe
the mixture framework. In Section 3, we study the affinely constrained and
unconstrained mixture methods updated with the EGU algorithm and the
EG algorithm. In Section 4, we first perform the transient analysis of the
affinely constrained mixtures and then continue with the transient analysis of
the unconstrained mixtures. Finally, in Section 5, we perform simulations to
show the accuracy of our results and to compare performances of the different
adaptive mixture methods. The paper concludes with certain remarks in
Section 6.



2 System Description 



2.1 Notation



In this paper, all vectors are column vectors and represented by boldface
lowercase letters. Matrices are represented by boldface capital letters. For
presentation purposes, we work only with real data. Given a vector $w$, $w^{(i)}$
denotes the $i$th individual entry of $w$, $w^T$ is the transpose of $w$,
$\|w\|_1 = \sum_i |w^{(i)}|$ is the $l_1$ norm; $\|w\| = \sqrt{w^T w}$ is the $l_2$ norm. For a matrix $W$,
$\mathrm{tr}(W)$ is the trace. For a vector $w$, $\mathrm{diag}(w)$ represents a diagonal matrix
formed using the entries of $w$. For a matrix $W$, $\mathrm{diag}(W)$ represents a column
vector that contains the diagonal entries of $W$. For two vectors $v_1$ and $v_2$,
we define the concatenation $[v_1; v_2] = [v_1^T \; v_2^T]^T$. For a random variable $v$, $\bar{v}$
is the expected value. For a random vector $v$ (or a random matrix $V$), $\bar{v}$ (or
$\bar{V}$) represents the expected value of each entry. Vectors (or matrices) $1$ and
$0$, with an abuse of notation, denote vectors (or matrices) of all ones or zeros,
respectively, where the size of the vector (or the matrix) is understood from
the context.

Fig. 1. A linear mixture of outputs of m adaptive filters.



2.2 System Description 



The framework that we study has two stages. In the first stage, we have m
adaptive filters producing outputs $y_i(t)$, $i = 1, \ldots, m$, running in parallel to
model a desired signal $y(t)$ as seen in Fig. 1. The second stage is the mixture
stage, where the outputs of the first stage filters are combined to improve
the steady-state and/or the transient performance over the constituent filters.
We linearly combine the outputs of the first stage filters to produce the final
output as $\hat{y}(t) = w^T(t)x(t)$, where $x(t) = [y_1(t), \ldots, y_m(t)]^T$, and train the
mixture weights using multiplicative updates (or exponentiated gradient updates)
[2]. We point out that in order to satisfy the constraints and derive
the multiplicative updates [9], [20], we use reparametrization of the mixture
weights as $w(t) = f(z(t))$ and perform the update on $z(t)$ as

$$z(t+1) = \arg\min_{z} \left\{ d(z, z(t)) + \mu\, l\big(y(t), f^T(z)x(t)\big) \right\}, \tag{1}$$

where $\mu$ is the learning rate of the update, $d(\cdot,\cdot)$ is an appropriate distance
measure and $l(\cdot,\cdot)$ is the instantaneous loss. We emphasize that in (1), the
updated vector $z$ is forced to be close to the present vector $z(t)$ by
$d(z(t+1), z(t))$, while trying to accurately model the current data by $l\big(y(t), f^T(z)x(t)\big)$.
However, instead of directly minimizing (1), a linearized version of (1),

$$z(t+1) = \arg\min_{z} \left\{ d(z, z(t)) + \mu\, l\big(y(t), f^T(z(t))x(t)\big) + \mu \nabla_z l\big(y(t), f^T(z)x(t)\big)^T \Big|_{z=z(t)} \big(z - z(t)\big) \right\}, \tag{2}$$



is minimized to get the desired update. As an example, if we use the $l_2$-norm
as the distance measure, i.e., $d(z, z(t)) = \|z - z(t)\|^2$, and the square error
as the instantaneous loss, i.e., $l\big(y(t), f^T(z)x(t)\big) = [y(t) - f^T(z)x(t)]^2$ with
$f(z) = z$, then we get the stochastic gradient update on $w(t)$, i.e.,

$$w(t+1) = w(t) + \mu e(t) x(t),$$

in (2).
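As a minimal sketch of the stochastic gradient special case above, the following plain-Python function performs one update of the mixture weights. The function name `lms_mixture_step` and the toy numbers are illustrative assumptions, not part of the paper; the error is taken as $e(t) = y(t) - w^T(t)x(t)$.

```python
def lms_mixture_step(w, x, y, mu):
    """One stochastic gradient (LMS) step on the mixture weights.

    w  : current mixture weights w(t)
    x  : constituent filter outputs x(t) = [y_1(t), ..., y_m(t)]
    y  : desired signal sample y(t)
    mu : learning rate
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x))  # mixture output w^T(t) x(t)
    e = y - y_hat                                  # estimation error e(t)
    w_next = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w_next, e


# Toy usage (hypothetical values): two constituent outputs, one desired sample.
w_next, e = lms_mixture_step([0.5, 0.5], [1.0, 3.0], 1.0, 0.1)
```

The additive form of this step is what the multiplicative updates of the next section replace.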

In the next section, we use the unnormalized relative entropy

$$d_1(z, z(t)) = \sum_{i=1}^{m} \left\{ z^{(i)} \ln\!\left[\frac{z^{(i)}}{z^{(i)}(t)}\right] + z^{(i)}(t) - z^{(i)} \right\} \tag{3}$$

for positively constrained $z$ and $z(t)$, $z \in \mathbb{R}_+^m$, $z(t) \in \mathbb{R}_+^m$, and the relative
entropy

$$d_2(z, z(t)) = \sum_{i=1}^{m} \left\{ z^{(i)} \ln\!\left[\frac{z^{(i)}}{z^{(i)}(t)}\right] \right\}, \tag{4}$$

where $z$ is constrained to be in an extended simplex such that $z^{(i)} \ge 0$,
$\sum_{k=1}^{m} z^{(k)} = u$ for some $u \ge 1$, as the distance measures, with appropriately
selected $f(\cdot)$ to derive updates on mixture weights under different constraints.
We first investigate affinely constrained mixture of m adaptive filters, and then
continue with the unconstrained mixture using (3) and (4) as the distance
measures.



3 Adaptive Mixture Algorithms 

In this section, we investigate affinely constrained and unconstrained mixtures 
updated with the EGU algorithm and the EG algorithm. 



3.1 Affinely Constrained Mixture 



When an affine constraint is imposed on the mixture such that $w^T(t) 1 = 1$,
we get

$$\hat{y}(t) = w^T(t)x(t),$$
$$e(t) = y(t) - \hat{y}(t),$$
$$w^{(i)}(t) = \lambda^{(i)}(t), \quad i = 1, \ldots, m-1,$$
$$w^{(m)}(t) = 1 - \sum_{i=1}^{m-1} \lambda^{(i)}(t),$$

where the $m-1$ dimensional vector $\lambda(t) = [\lambda^{(1)}(t), \ldots, \lambda^{(m-1)}(t)]^T$ is the
unconstrained weight vector, i.e., $\lambda(t) \in \mathbb{R}^{m-1}$. Using $\lambda(t)$ as the unconstrained
weight vector, the error can be written as $e(t) = [y(t) - y_m(t)] - \lambda^T(t)\delta(t)$,
where $\delta(t) = [y_1(t) - y_m(t), \ldots, y_{m-1}(t) - y_m(t)]^T$. To be able to derive a
multiplicative update on $\lambda(t)$, we use

$$\lambda(t) = \lambda_1(t) - \lambda_2(t),$$

where $\lambda_1(t)$ and $\lambda_2(t)$ are constrained to be nonnegative, i.e., $\lambda_i(t) \in \mathbb{R}_+^{m-1}$,
$i = 1, 2$. After we collect unconstrained weights in $\lambda_a(t) = [\lambda_1(t); \lambda_2(t)]$, we
define a function of loss $e(t)$ as

$$l_a(\lambda_a(t)) = e^2(t)$$

and update positively constrained $\lambda_a(t)$ as follows.



3.1.1 Unnormalized Relative Entropy 

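Using the unnormalized relative entropy as the distance measure, we get

$$\lambda_a(t+1) = \arg\min_{\lambda} \left\{ \sum_{i=1}^{2(m-1)} \left[ \lambda^{(i)} \ln\!\left(\frac{\lambda^{(i)}}{\lambda_a^{(i)}(t)}\right) + \lambda_a^{(i)}(t) - \lambda^{(i)} \right] + \mu \left[ l_a(\lambda_a(t)) + \nabla_{\lambda} l_a(\lambda)^T \Big|_{\lambda=\lambda_a(t)} \big(\lambda - \lambda_a(t)\big) \right] \right\}.$$

After some algebra this yields

$$\lambda_a^{(i)}(t+1) = \lambda_a^{(i)}(t) \exp\{\mu e(t)(y_i(t) - y_m(t))\}, \quad i = 1, \ldots, m-1,$$
$$\lambda_a^{(i)}(t+1) = \lambda_a^{(i)}(t) \exp\{-\mu e(t)(y_{i-m+1}(t) - y_m(t))\}, \quad i = m, \ldots, 2(m-1),$$

providing the multiplicative updates on $\lambda_1(t)$ and $\lambda_2(t)$.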
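The EGU update on the affinely constrained mixture can be sketched as below. This is an illustrative reading of the update, not the authors' code: the function names `egu_affine_step` and `mixture_weights` and all numeric values are assumptions. The sketch keeps $\lambda_1(t)$ and $\lambda_2(t)$ as separate lists and recovers the affine weights through $w^{(m)}(t) = 1 - \sum_i \lambda^{(i)}(t)$.

```python
import math


def egu_affine_step(lam1, lam2, y_outs, y, mu):
    """One EGU step for the affinely constrained mixture.

    lam1, lam2 : positive lists of length m-1 (lambda_1(t), lambda_2(t))
    y_outs     : constituent outputs [y_1(t), ..., y_m(t)]
    y          : desired signal sample y(t)
    mu         : learning rate
    """
    y_m = y_outs[-1]
    delta = [yi - y_m for yi in y_outs[:-1]]        # delta(t)
    lam = [a - b for a, b in zip(lam1, lam2)]       # lambda(t) = lambda_1 - lambda_2
    e = (y - y_m) - sum(l * d for l, d in zip(lam, delta))
    lam1_next = [a * math.exp(mu * e * d) for a, d in zip(lam1, delta)]
    lam2_next = [b * math.exp(-mu * e * d) for b, d in zip(lam2, delta)]
    return lam1_next, lam2_next, e


def mixture_weights(lam1, lam2):
    """Recover the m affine mixture weights (they sum to one by construction)."""
    lam = [a - b for a, b in zip(lam1, lam2)]
    return lam + [1.0 - sum(lam)]


# Toy usage with hypothetical values (m = 3 constituent filters).
l1, l2 = [1.0, 0.5], [0.5, 0.5]
n1, n2, err = egu_affine_step(l1, l2, [1.0, 2.0, 0.0], 1.0, 0.1)
w = mixture_weights(n1, n2)
```

Because the two halves are scaled by reciprocal factors, each product $\lambda_1^{(i)}(t)\lambda_2^{(i)}(t)$ stays constant under this update, and every entry remains positive.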

3.1.2 Relative Entropy 

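Using the relative entropy as the distance measure, we get

$$\lambda_a(t+1) = \arg\min_{\lambda} \left\{ \sum_{i=1}^{2(m-1)} \left[ \lambda^{(i)} \ln\!\left(\frac{\lambda^{(i)}}{\lambda_a^{(i)}(t)}\right) \right] + \gamma\,(u - 1^T\lambda) + \mu \left[ l_a(\lambda_a(t)) + \nabla_{\lambda} l_a(\lambda)^T \Big|_{\lambda=\lambda_a(t)} \big(\lambda - \lambda_a(t)\big) \right] \right\},$$

where $\gamma$ is the Lagrange multiplier. This yields

$$\lambda_a^{(i)}(t+1) = u\, \frac{\lambda_a^{(i)}(t) \exp\{\mu e(t)(y_i(t) - y_m(t))\}}{\sum_{k=1}^{m-1} \left[ \lambda_a^{(k)}(t) \exp\{\mu e(t)(y_k(t) - y_m(t))\} + \lambda_a^{(k+m-1)}(t) \exp\{-\mu e(t)(y_k(t) - y_m(t))\} \right]}, \quad i = 1, \ldots, m-1,$$

$$\lambda_a^{(i)}(t+1) = u\, \frac{\lambda_a^{(i)}(t) \exp\{-\mu e(t)(y_{i-m+1}(t) - y_m(t))\}}{\sum_{k=1}^{m-1} \left[ \lambda_a^{(k)}(t) \exp\{\mu e(t)(y_k(t) - y_m(t))\} + \lambda_a^{(k+m-1)}(t) \exp\{-\mu e(t)(y_k(t) - y_m(t))\} \right]}, \quad i = m, \ldots, 2(m-1),$$

providing the multiplicative updates on $\lambda_a(t)$.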
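A minimal sketch of the EG update on the stacked vector $\lambda_a(t) = [\lambda_1(t); \lambda_2(t)]$ follows. The function name `eg_affine_step` and the toy numbers are assumptions for illustration; the normalization keeps the entries on the extended simplex with total mass $u$.

```python
import math


def eg_affine_step(lam_a, y_outs, y, mu, u):
    """One EG step for the affinely constrained mixture.

    lam_a  : stacked weights [lambda_1; lambda_2], length 2(m-1)
    y_outs : constituent outputs [y_1(t), ..., y_m(t)]
    y      : desired signal sample y(t)
    mu     : learning rate
    u      : total mass of the extended simplex
    """
    n = len(lam_a) // 2
    y_m = y_outs[-1]
    delta = [yi - y_m for yi in y_outs[:-1]]
    lam = [lam_a[i] - lam_a[i + n] for i in range(n)]
    e = (y - y_m) - sum(l * d for l, d in zip(lam, delta))
    plus = [lam_a[i] * math.exp(mu * e * delta[i]) for i in range(n)]
    minus = [lam_a[i + n] * math.exp(-mu * e * delta[i]) for i in range(n)]
    total = sum(plus) + sum(minus)                  # shared denominator
    return [u * v / total for v in plus + minus], e


# Toy usage with hypothetical values (m = 3, u = 2).
lam_next, err = eg_affine_step([1.0, 0.5, 0.25, 0.25], [1.0, 2.0, 0.0], 1.0, 0.1, 2.0)
```

The only difference from the EGU step is the division by the shared denominator, which is what ties all entries together and lets a few weights take most of the mass.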
3.2 Unconstrained Mixture 

Without any constraints on the combination weights, the mixture stage can 
be written as 

$$\hat{y}(t) = w^T(t)x(t),$$
$$e(t) = y(t) - \hat{y}(t),$$

where $w(t) \in \mathbb{R}^m$. To be able to derive a multiplicative update, we use a
change of variables,

$$w(t) = w_1(t) - w_2(t),$$

where $w_1(t)$ and $w_2(t)$ are constrained to be nonnegative, i.e., $w_i(t) \in \mathbb{R}_+^m$,
$i = 1, 2$. We then collect the unconstrained weights $w_a(t) = [w_1(t); w_2(t)]$ and
define a function of the loss $e(t)$ as

$$l_u(w_a(t)) = e^2(t).$$

3.2.1 Unnormalized Relative Entropy 

Defining a cost function similar to (4) and minimizing it with respect to $w$ yields

$$w_a^{(i)}(t+1) = w_a^{(i)}(t) \exp\{\mu e(t) y_i(t)\}, \quad i = 1, \ldots, m,$$
$$w_a^{(i)}(t+1) = w_a^{(i)}(t) \exp\{-\mu e(t) y_{i-m}(t)\}, \quad i = m+1, \ldots, 2m,$$

providing the multiplicative update on $w_a(t)$.



3.2.2 Relative Entropy 



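Using the relative entropy under the simplex constraint on $w_a$, we get the
updates

$$w_a^{(i)}(t+1) = u\, \frac{w_a^{(i)}(t) \exp\{\mu e(t) y_i(t)\}}{\sum_{k=1}^{m} \left[ w_a^{(k)}(t) \exp\{\mu e(t) y_k(t)\} + w_a^{(k+m)}(t) \exp\{-\mu e(t) y_k(t)\} \right]}, \quad i = 1, \ldots, m,$$

$$w_a^{(i)}(t+1) = u\, \frac{w_a^{(i)}(t) \exp\{-\mu e(t) y_{i-m}(t)\}}{\sum_{k=1}^{m} \left[ w_a^{(k)}(t) \exp\{\mu e(t) y_k(t)\} + w_a^{(k+m)}(t) \exp\{-\mu e(t) y_k(t)\} \right]}, \quad i = m+1, \ldots, 2m.$$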
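Both unconstrained updates can be sketched with one function, since EG is the EGU step followed by a rescaling to total mass $u$. The function name `unconstrained_step`, the optional `u` argument and the toy values are assumptions made for this illustration only.

```python
import math


def unconstrained_step(w_a, x, y, mu, u=None):
    """One multiplicative step on w_a = [w_1; w_2] (length 2m).

    With u=None this is the EGU update; with a positive u the result is
    rescaled so that its entries sum to u, which gives the EG update.
    """
    m = len(x)
    w = [w_a[i] - w_a[i + m] for i in range(m)]      # w(t) = w_1(t) - w_2(t)
    e = y - sum(wi * xi for wi, xi in zip(w, x))     # e(t) = y(t) - w^T(t) x(t)
    plus = [w_a[i] * math.exp(mu * e * x[i]) for i in range(m)]
    minus = [w_a[i + m] * math.exp(-mu * e * x[i]) for i in range(m)]
    new = plus + minus
    if u is not None:
        total = sum(new)
        new = [u * v / total for v in new]
    return new, e


# Toy usage with hypothetical values (m = 2).
start = [0.75, 0.75, 0.75, 0.75]
egu_next, e_egu = unconstrained_step(start, [1.0, -2.0], 0.5, 0.05)
eg_next, e_eg = unconstrained_step(start, [1.0, -2.0], 0.5, 0.05, u=3.0)
```

Starting from equal halves gives $w(t) = 0$, so the first error equals the desired sample; the effective weights are always read out as the difference of the two halves.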



In the next section, we study the transient analysis of these four adaptive 
mixture algorithms. 



4 Transient Analysis 

In this section, we study the mean and the mean-square transient analysis of 
the adaptive mixture methods. We start with the affinely constrained combi- 
nation. 

4.1 Affinely Constrained Mixture 

We first perform the transient analysis of the mixture weights updated with 
the EGU algorithm. Then, we continue with the transient analysis of the 
mixture weights updated with the EG algorithm. 

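4.1.1 Unnormalized Relative Entropy

For the affinely constrained mixture updated with the EGU algorithm, we
have the multiplicative update as

$$\lambda_1^{(i)}(t+1) = \lambda_1^{(i)}(t) \exp\{\mu e(t)(y_i(t) - y_m(t))\} = \lambda_1^{(i)}(t) \sum_{k=0}^{\infty} \frac{\big(\mu e(t)(y_i(t) - y_m(t))\big)^k}{k!}, \tag{5}$$

$$\lambda_2^{(i)}(t+1) = \lambda_2^{(i)}(t) \exp\{-\mu e(t)(y_i(t) - y_m(t))\} = \lambda_2^{(i)}(t) \sum_{k=0}^{\infty} \frac{\big(-\mu e(t)(y_i(t) - y_m(t))\big)^k}{k!}, \tag{6}$$

for $i = 1, \ldots, m-1$. If $e(t)$ and $y_i(t) - y_m(t)$ for each $i = 1, \ldots, m-1$ are
bounded, then we can write (5) and (6) as

$$\lambda_1^{(i)}(t+1) = \lambda_1^{(i)}(t)\big(1 + \mu e(t)(y_i(t) - y_m(t)) + O(\mu^2)\big), \tag{7}$$

$$\lambda_2^{(i)}(t+1) = \lambda_2^{(i)}(t)\big(1 - \mu e(t)(y_i(t) - y_m(t)) + O(\mu^2)\big), \tag{8}$$

for $i = 1, \ldots, m-1$. Since $\mu$ is usually relatively small p], we approximate (7)
and (8) as

$$\lambda_1^{(i)}(t+1) = \lambda_1^{(i)}(t)\big(1 + \mu e(t)(y_i(t) - y_m(t))\big), \tag{9}$$

$$\lambda_2^{(i)}(t+1) = \lambda_2^{(i)}(t)\big(1 - \mu e(t)(y_i(t) - y_m(t))\big). \tag{10}$$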
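To make the size of the dropped $O(\mu^2)$ term concrete, the short sketch below measures the gap between the exponential factor and its first-order approximation for one sample. The function name `first_order_gap` and the sample values are illustrative assumptions.

```python
import math


def first_order_gap(mu, e, d):
    """|exp(mu*e*d) - (1 + mu*e*d)| for error e and output difference d."""
    a = mu * e * d
    return abs(math.exp(a) - (1.0 + a))


# Hypothetical sample: e(t) = 1 and y_i(t) - y_m(t) = 1.
gap_small = first_order_gap(0.005, 1.0, 1.0)
gap_large = first_order_gap(0.01, 1.0, 1.0)
ratio = gap_large / gap_small   # close to 4, as expected for a second-order term
```

Halving the learning rate roughly quarters the gap, which is the behaviour the approximation relies on when $\mu$ is small and the signals are bounded.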

In our simulations, we illustrate the accuracy of the approximations (9) and
(10) under the mixture framework. Using (9) and (10), we can obtain updates
on $\lambda_1(t)$ and $\lambda_2(t)$ as

$$\lambda_1(t+1) = \big(I + \mu e(t)\,\mathrm{diag}(\delta(t))\big)\lambda_1(t), \tag{11}$$

$$\lambda_2(t+1) = \big(I - \mu e(t)\,\mathrm{diag}(\delta(t))\big)\lambda_2(t). \tag{12}$$

Collecting the weights in $\lambda_a(t) = [\lambda_1(t); \lambda_2(t)]$, using the updates (11) and
(12), we can write the update on $\lambda_a(t)$ as

$$\lambda_a(t+1) = \big(I + \mu e(t)\,\mathrm{diag}(u(t))\big)\lambda_a(t), \tag{13}$$

where $u(t)$ is defined as $u(t) = [\delta(t); -\delta(t)]$.
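The stacked first-order update can be checked numerically against the difference form $\lambda(t+1) = \lambda(t) + \mu e(t)\,\mathrm{diag}(\delta(t))(\lambda_1(t) + \lambda_2(t))$, which follows by subtracting the second half from the first. The helper name `stacked_update` and the numbers below are assumptions for this sketch.

```python
def stacked_update(lam1, lam2, delta, e, mu):
    """First-order update of lambda_a = [lambda_1; lambda_2] with u = [delta; -delta]."""
    lam_a = lam1 + lam2
    u_vec = delta + [-d for d in delta]
    return [(1.0 + mu * e * ui) * li for ui, li in zip(u_vec, lam_a)]


# Hypothetical values with m - 1 = 2.
lam1, lam2 = [0.6, 0.2], [0.1, 0.4]
delta, e, mu = [1.0, -0.5], 0.3, 0.1

new_a = stacked_update(lam1, lam2, delta, e, mu)
n = len(delta)
lam_from_stack = [new_a[i] - new_a[i + n] for i in range(n)]
lam_direct = [(a - b) + mu * e * d * (a + b) for a, b, d in zip(lam1, lam2, delta)]
```

Both routes give the same $\lambda(t+1)$, so the analysis can work with whichever form is more convenient.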



For the desired signal $y(t)$, we can write $y(t) - y_m(t) = \lambda_o^T(t)\delta(t) + e_o(t)$, where
$\lambda_o(t)$ is the optimum MSE solution at time $t$ such that $\lambda_o(t) = R^{-1}(t)p(t)$,
$R(t) = E[\delta(t)\delta^T(t)]$, $p(t) = E\{\delta(t)[y(t) - y_m(t)]\}$ and $e_o(t)$ is zero-mean
and uncorrelated with $\delta(t)$. We next show that the mixture weights converge
to the optimum solution in the steady-state such that $\lim_{t\to\infty} E[\lambda(t)] =
\lim_{t\to\infty} \lambda_o(t)$ for properly selected $\mu$.

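Subtracting (12) from (11), we obtain

$$\begin{aligned}
\lambda(t+1) &= \lambda(t) + \mu e(t)\,\mathrm{diag}(\delta(t))\big(\lambda_1(t) + \lambda_2(t)\big) \\
&= \lambda(t) - \mu e(t)\,\mathrm{diag}(\delta(t))\lambda(t) + 2\mu e(t)\,\mathrm{diag}(\delta(t))\lambda_1(t).
\end{aligned} \tag{14}$$

Defining $\epsilon(t) = \lambda_o(t) - \lambda(t)$ and using $e(t) = \delta^T(t)\epsilon(t) + e_o(t)$ in (14) yield

$$\begin{aligned}
\lambda(t+1) = {}& \lambda(t) - \mu\,\mathrm{diag}(\delta(t))\lambda(t)\delta^T(t)\epsilon(t) - \mu\,\mathrm{diag}(\delta(t))\lambda(t)e_o(t) \\
& + 2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)\delta^T(t)\epsilon(t) + 2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)e_o(t).
\end{aligned} \tag{15}$$

In (15), subtracting both sides from $\lambda_o(t+1)$, we have

$$\begin{aligned}
\epsilon(t+1) = {}& \epsilon(t) + \mu\,\mathrm{diag}(\delta(t))\lambda(t)\delta^T(t)\epsilon(t) + \mu\,\mathrm{diag}(\delta(t))\lambda(t)e_o(t) \\
& - 2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)\delta^T(t)\epsilon(t) - 2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)e_o(t) \\
& + \big[\lambda_o(t+1) - \lambda_o(t)\big].
\end{aligned} \tag{16}$$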



Taking expectation of both sides of (16) and using

$$E\big[\mu\,\mathrm{diag}(\delta(t))\lambda(t)e_o(t)\big] = E\big[\mu\,\mathrm{diag}(\delta(t))\lambda(t)\big]E[e_o(t)] = 0,$$
$$E\big[2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)e_o(t)\big] = E\big[2\mu\,\mathrm{diag}(\delta(t))\lambda_1(t)\big]E[e_o(t)] = 0,$$

and assuming that $\lambda_1(t)$ and $\lambda_2(t)$ are independent of $\epsilon(t)$ [17] yield

$$E[\epsilon(t+1)] = E\big[I - \mu\,\mathrm{diag}(\lambda_1(t) + \lambda_2(t))\delta(t)\delta^T(t)\big]E[\epsilon(t)] + E\big[\lambda_o(t+1) - \lambda_o(t)\big]. \tag{17}$$

Assuming convergence of $R(t)$ and $p(t)$ (which is true for a wide range of
adaptive methods in the first stage [16], [111121]), we obtain
$\lim_{t\to\infty} E[\lambda_o(t+1) - \lambda_o(t)] = 0$. If $\mu$ is chosen such that the eigenvalues of
$E\big[I - \mu\,\mathrm{diag}(\lambda_1(t) + \lambda_2(t))\delta(t)\delta^T(t)\big]$ have strictly less than unit magnitude for every $t$, then
$\lim_{t\to\infty} E[\lambda(t)] = \lim_{t\to\infty} \lambda_o(t)$.



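For the transient analysis of the MSE, we have

$$\begin{aligned}
E[e^2(t)] &= E\{[y(t) - y_m(t)]^2\} - 2\bar{\lambda}_a^T(t)E\{[y(t) - y_m(t)][\delta(t); -\delta(t)]\} \\
&\quad + E\{\lambda_a^T(t)[\delta(t); -\delta(t)][\delta(t); -\delta(t)]^T\lambda_a(t)\} \\
&= E\{[y(t) - y_m(t)]^2\} - 2\bar{\lambda}_a^T(t)E\{[y(t) - y_m(t)]u(t)\} + \mathrm{tr}\big(E[\lambda_a(t)\lambda_a^T(t)]E\{u(t)u^T(t)\}\big) \\
&= E\{[y(t) - y_m(t)]^2\} - 2\bar{\lambda}_a^T(t)\gamma(t) + \mathrm{tr}\big(E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t)\big),
\end{aligned} \tag{18}$$

where we define $\gamma(t) = E\{u(t)[y(t) - y_m(t)]\}$ and $\Gamma(t) = E\{u(t)u^T(t)\}$.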
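The final MSE expression, $E[e^2(t)] = E\{[y(t) - y_m(t)]^2\} - 2\bar{\lambda}_a^T(t)\gamma(t) + \mathrm{tr}(E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t))$, is straightforward to evaluate once the moments are available. The sketch below is an illustration with assumed names (`mse_from_moments`, `sigma2` for $E\{[y(t) - y_m(t)]^2\}$) and a hypothetical two-filter example in which $y(t) - y_m(t) = \delta(t)$ exactly.

```python
def mse_from_moments(sigma2, q, Q, gamma, Gamma):
    """Evaluate sigma2 - 2 q^T gamma + tr(Q Gamma).

    sigma2 : E{[y(t) - y_m(t)]^2}
    q      : mean weight vector E[lambda_a(t)]
    Q      : second moment E[lambda_a(t) lambda_a^T(t)] as a list of rows
    gamma  : E{u(t) [y(t) - y_m(t)]}
    Gamma  : E{u(t) u^T(t)} as a list of rows
    """
    n = len(q)
    cross = sum(q[i] * gamma[i] for i in range(n))
    trace = sum(Q[i][j] * Gamma[j][i] for i in range(n) for j in range(n))
    return sigma2 - 2.0 * cross + trace


# Hypothetical case: m = 2, delta has power 2, and y - y_m equals delta,
# so the deterministic weights lambda_a = [1, 0] model the signal exactly.
Gamma = [[2.0, -2.0], [-2.0, 2.0]]
gamma = [2.0, -2.0]
mse_opt = mse_from_moments(2.0, [1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]], gamma, Gamma)
mse_zero = mse_from_moments(2.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], gamma, Gamma)
```

With the optimum weights the expression evaluates to zero, and with all-zero weights it reduces to the signal power.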



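For the recursion of $\bar{\lambda}_a(t) = E[\lambda_a(t)]$, using (13), we get

$$\bar{\lambda}_a(t+1) = \bar{\lambda}_a(t) + \mu\,\mathrm{diag}(\gamma(t))\bar{\lambda}_a(t) - \mu\,\mathrm{diag}\big(E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t)\big). \tag{19}$$

Using (13), assuming $\lambda_a(t)$ is Gaussian and assuming $\lambda_a^{(i)}(t)$ and $\lambda_a^{(j)}(t)$ are
independent when $i \neq j$ [17], [H], we get a recursion for $E[\lambda_a(t)\lambda_a^T(t)]$ as

$$\begin{aligned}
E[\lambda_a(t+1)\lambda_a^T(t+1)] = {}& E[\lambda_a(t)\lambda_a^T(t)] + \mu\,\mathrm{diag}(\gamma(t))E[\lambda_a(t)\lambda_a^T(t)] \\
& - \mu\,\mathrm{diag}(\Gamma(t)\bar{\lambda}_a(t))E[\lambda_a(t)\lambda_a^T(t)] \\
& - \mu E[\mathrm{diag}^2(u(t))]\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)1\,\bar{\lambda}_a^T(t) \\
& - \mu\,\mathrm{diag}(\bar{\lambda}_a(t))\Gamma(t)\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big) \\
& + \mu E[\lambda_a(t)\lambda_a^T(t)]\,\mathrm{diag}(\gamma(t)) - \mu E[\lambda_a(t)\lambda_a^T(t)]\,\mathrm{diag}(\Gamma(t)\bar{\lambda}_a(t)) \\
& - \mu\bar{\lambda}_a(t)1^T\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)E[\mathrm{diag}^2(u(t))] \\
& - \mu\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)\Gamma(t)\,\mathrm{diag}(\bar{\lambda}_a(t)).
\end{aligned} \tag{20}$$

Defining $q_a(t) = \bar{\lambda}_a(t)$ and $Q_a(t) = E[\lambda_a(t)\lambda_a^T(t)]$, we express (19) and (20)
as coupled recursions in Table 1.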

Table 1
Time evolution of the mean and the variance of the affinely constrained mixture
weights updated with the EGU algorithm

$$q_a(t+1) = q_a(t) + \mu\,\mathrm{diag}(\gamma(t))q_a(t) - \mu\,\mathrm{diag}\big(Q_a(t)\Gamma(t)\big),$$

$$\begin{aligned}
Q_a(t+1) = {}& \big(I + \mu\,\mathrm{diag}(\gamma(t)) - \mu\,\mathrm{diag}(\Gamma(t)q_a(t))\big)Q_a(t) - \mu E[\mathrm{diag}^2(u(t))]\big(Q_a(t) - q_a(t)q_a^T(t)\big)1\,q_a^T(t) \\
& - \mu\,\mathrm{diag}(q_a(t))\Gamma(t)\big(Q_a(t) - q_a(t)q_a^T(t)\big) + Q_a(t)\big(\mu\,\mathrm{diag}(\gamma(t)) - \mu\,\mathrm{diag}(\Gamma(t)q_a(t))\big) \\
& - \mu q_a(t)1^T\big(Q_a(t) - q_a(t)q_a^T(t)\big)E[\mathrm{diag}^2(u(t))] - \mu\big(Q_a(t) - q_a(t)q_a^T(t)\big)\Gamma(t)\,\mathrm{diag}(q_a(t)).
\end{aligned}$$

In Table 1, we provide the mean and the variance recursions for $Q_a(t)$ and
$q_a(t)$. To implement these recursions, one needs to only provide $\Gamma(t)$ and $\gamma(t)$.
Note that $\Gamma(t)$ and $\gamma(t)$ are derived for a wide range of adaptive filters [16],
[Ti] . If we use the mean and the variance recursions in (18), then we obtain
the time evolution of the final MSE. This completes the transient analysis of
the affinely constrained mixture weights updated with the EGU algorithm.
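One way to iterate the coupled mean and second-moment recursions for the EGU algorithm is sketched below in plain Python. The names `matmul` and `table1_step` are assumptions, and the sketch reads the recursion as $Q_a(t+1) = Q_a(t) + \mu(L + L^T)$ with $L = \mathrm{diag}(\gamma - \Gamma q_a)Q_a - E[\mathrm{diag}^2(u)](Q_a - q_a q_a^T)1 q_a^T - \mathrm{diag}(q_a)\Gamma(Q_a - q_a q_a^T)$, which holds when $Q_a(t)$ and $\Gamma(t)$ are symmetric; $E[\mathrm{diag}^2(u(t))]$ is taken as the diagonal of $\Gamma(t)$.

```python
def matmul(A, B):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def table1_step(q, Q, gamma, Gamma, mu):
    """One step of the coupled recursions for q_a(t) and Q_a(t)."""
    n = len(q)
    QG = matmul(Q, Gamma)
    # Mean recursion: q + mu*diag(gamma) q - mu*diag(Q Gamma)
    q_next = [q[i] + mu * gamma[i] * q[i] - mu * QG[i][i] for i in range(n)]
    C = [[Q[i][j] - q[i] * q[j] for j in range(n)] for i in range(n)]  # Q - q q^T
    Gq = [sum(Gamma[i][k] * q[k] for k in range(n)) for i in range(n)]
    GC = matmul(Gamma, C)
    c_row = [sum(row) for row in C]                                    # (Q - q q^T) 1
    # Left-acting first-order terms; the right-acting terms are the transpose.
    L = [[(gamma[i] - Gq[i]) * Q[i][j]
          - Gamma[i][i] * c_row[i] * q[j]
          - q[i] * GC[i][j]
          for j in range(n)] for i in range(n)]
    Q_next = [[Q[i][j] + mu * (L[i][j] + L[j][i]) for j in range(n)]
              for i in range(n)]
    return q_next, Q_next


# Scalar sanity case and a small two-dimensional case (hypothetical moments).
q1, Q1 = table1_step([1.0], [[2.0]], [1.0], [[1.0]], 0.1)
q2, Q2 = table1_step([0.5, 0.2], [[0.3, 0.1], [0.1, 0.1]],
                     [0.4, -0.4], [[1.0, -1.0], [-1.0, 1.0]], 0.01)
q0, Q0 = table1_step([0.5, 0.2], [[0.3, 0.1], [0.1, 0.1]],
                     [0.4, -0.4], [[1.0, -1.0], [-1.0, 1.0]], 0.0)
```

Feeding the resulting $q_a(t)$ and $Q_a(t)$ into the MSE expression at each step gives the theoretical learning curve; in practice $\gamma(t)$ and $\Gamma(t)$ would be supplied per time step.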



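4.1.2 Relative Entropy

For the affinely constrained combination updated with the EG algorithm, we
have the multiplicative updates as

$$\lambda_1^{(i)}(t+1) = u\, \frac{\lambda_1^{(i)}(t) \exp\{\mu e(t)(y_i(t) - y_m(t))\}}{\sum_{k=1}^{m-1} \left[ \lambda_1^{(k)}(t) \exp\{\mu e(t)(y_k(t) - y_m(t))\} + \lambda_2^{(k)}(t) \exp\{-\mu e(t)(y_k(t) - y_m(t))\} \right]},$$

$$\lambda_2^{(i)}(t+1) = u\, \frac{\lambda_2^{(i)}(t) \exp\{-\mu e(t)(y_i(t) - y_m(t))\}}{\sum_{k=1}^{m-1} \left[ \lambda_1^{(k)}(t) \exp\{\mu e(t)(y_k(t) - y_m(t))\} + \lambda_2^{(k)}(t) \exp\{-\mu e(t)(y_k(t) - y_m(t))\} \right]},$$

for $i = 1, \ldots, m-1$. Using the same approximations as in (7), (8), (9) and
(10), we obtain

$$\lambda_1^{(i)}(t+1) = u\, \frac{\lambda_1^{(i)}(t)\big(1 + \mu e(t)(y_i(t) - y_m(t))\big)}{\sum_{k=1}^{m-1} \left[ \lambda_1^{(k)}(t)\big(1 + \mu e(t)(y_k(t) - y_m(t))\big) + \lambda_2^{(k)}(t)\big(1 - \mu e(t)(y_k(t) - y_m(t))\big) \right]}, \tag{21}$$

$$\lambda_2^{(i)}(t+1) = u\, \frac{\lambda_2^{(i)}(t)\big(1 - \mu e(t)(y_i(t) - y_m(t))\big)}{\sum_{k=1}^{m-1} \left[ \lambda_1^{(k)}(t)\big(1 + \mu e(t)(y_k(t) - y_m(t))\big) + \lambda_2^{(k)}(t)\big(1 - \mu e(t)(y_k(t) - y_m(t))\big) \right]}. \tag{22}$$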



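In our simulations, we illustrate the accuracy of the approximations (21) and
(22) under the mixture framework. Using (21) and (22), we obtain updates on
$\lambda_1(t)$ and $\lambda_2(t)$ as

$$\lambda_1(t+1) = u\, \frac{\big(I + \mu e(t)\,\mathrm{diag}(\delta(t))\big)\lambda_1(t)}{\big[1^T + \mu e(t)u^T(t)\big]\lambda_a(t)}, \tag{23}$$

$$\lambda_2(t+1) = u\, \frac{\big(I - \mu e(t)\,\mathrm{diag}(\delta(t))\big)\lambda_2(t)}{\big[1^T + \mu e(t)u^T(t)\big]\lambda_a(t)}. \tag{24}$$

Using updates (23) and (24), we can write the update on $\lambda_a(t)$ as

$$\lambda_a(t+1) = u\, \frac{\big[I + \mu e(t)\,\mathrm{diag}(u(t))\big]\lambda_a(t)}{\big[1^T + \mu e(t)u^T(t)\big]\lambda_a(t)}. \tag{25}$$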
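For the recursion of $\bar{\lambda}_a(t)$, using (25), we get

$$E[\lambda_a(t+1)] = E\left\{ u\, \frac{\big[I + \mu e(t)\,\mathrm{diag}(u(t))\big]\lambda_a(t)}{\big[1^T + \mu e(t)u^T(t)\big]\lambda_a(t)} \right\} \approx u\, \frac{E\big\{\big[I + \mu e(t)\,\mathrm{diag}(u(t))\big]\lambda_a(t)\big\}}{E\big\{\big[1^T + \mu e(t)u^T(t)\big]\lambda_a(t)\big\}} \tag{26}$$

$$= u\, \frac{E[\lambda_a(t)] + \mu\,\mathrm{diag}(\gamma(t))E[\lambda_a(t)] - \mu\,\mathrm{diag}\big(E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t)\big)}{\big[1^T + \mu\gamma^T(t)\big]E[\lambda_a(t)] - \mu\,\mathrm{tr}\big(E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t)\big)}, \tag{27}$$

where in (26) we approximate expectation of the quotient with the quotient
of the expectations. In our simulations, we also illustrate the accuracy of this
approximation in the mixture framework. From (25), using the same approximation
as in (27), assuming $\lambda_a(t)$ is Gaussian, assuming $\lambda_a^{(i)}(t)$ and $\lambda_a^{(j)}(t)$ are
independent when $i \neq j$, we get a recursion for $E[\lambda_a(t)\lambda_a^T(t)]$ as

$$E[\lambda_a(t+1)\lambda_a^T(t+1)] = u^2\, \frac{A(t)}{b(t)}, \tag{28}$$

where $A(t)$ is equal to the right hand side of (20) and

$$\begin{aligned}
b(t) = {}& 1^T E[\lambda_a(t)\lambda_a^T(t)]1 + \mu\gamma^T(t)E[\lambda_a(t)\lambda_a^T(t)]1 \\
& - \mu\bar{\lambda}_a^T(t)\Gamma(t)E[\lambda_a(t)\lambda_a^T(t)]1 - \mu 1^T\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)\Gamma(t)\bar{\lambda}_a(t) \\
& - \mu 1^T\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)E[\mathrm{diag}^2(u(t))]1\,1^T\bar{\lambda}_a(t) \\
& + \mu 1^T E[\lambda_a(t)\lambda_a^T(t)]\gamma(t) - \mu 1^T E[\lambda_a(t)\lambda_a^T(t)]\Gamma(t)\bar{\lambda}_a(t) \\
& - \mu\bar{\lambda}_a^T(t)\Gamma(t)\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)1 \\
& - \mu 1^T\bar{\lambda}_a(t)\,1^T E[\mathrm{diag}^2(u(t))]\big(E[\lambda_a(t)\lambda_a^T(t)] - \bar{\lambda}_a(t)\bar{\lambda}_a^T(t)\big)1.
\end{aligned} \tag{29}$$

If we use the mean (27) and the variance (28), (29) recursions in (18), then
we obtain the time evolution of the final MSE. This completes the transient
analysis of the affinely constrained mixture weights updated with the EG
algorithm.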
4.2 Unconstrained Mixture

We use the unnormalized relative entropy and the relative entropy as distance
measures to update unconstrained mixture weights. We first perform transient
analysis of the mixture weights updated using the EGU algorithm. Then, we
continue with the transient analysis of the mixture weights updated using the
EG algorithm. Note that since the unconstrained case is close to the affinely
constrained case, we only provide the necessary modifications to get the mean
and the variance recursions for the transient analysis.

4.2.1 Unnormalized Relative Entropy

For the unconstrained combination updated with EGU, we have the multiplicative
updates as

$$w_1^{(i)}(t+1) = w_1^{(i)}(t) \exp\{\mu e(t)y_i(t)\},$$
$$w_2^{(i)}(t+1) = w_2^{(i)}(t) \exp\{-\mu e(t)y_i(t)\},$$

for $i = 1, \ldots, m$. Using the same approximations as in (7), (8), (9) and (10),
we can obtain updates on $w_1(t)$ and $w_2(t)$ as

$$w_1(t+1) = \big(I + \mu e(t)\,\mathrm{diag}(x(t))\big)w_1(t), \tag{30}$$

$$w_2(t+1) = \big(I - \mu e(t)\,\mathrm{diag}(x(t))\big)w_2(t). \tag{31}$$

Collecting the weights in $w_a(t) = [w_1(t); w_2(t)]$, using the updates (30) and
(31), we can write the update on $w_a(t)$ as

$$w_a(t+1) = \big(I + \mu e(t)\,\mathrm{diag}(u(t))\big)w_a(t), \tag{32}$$

where $u(t)$ is defined as $u(t) = [x(t); -x(t)]$.

For the desired signal $y(t)$, we can write $y(t) = w_o^T(t)x(t) + e_o(t)$, where
$w_o(t)$ is the optimum MSE solution at time $t$ such that $w_o(t) = R^{-1}(t)p(t)$,
$R(t) = E[x(t)x^T(t)]$, $p(t) = E\{x(t)y(t)\}$ and $e_o(t)$ is a zero-mean disturbance
uncorrelated with $x(t)$. To show that the mixture weights converge to the optimum
solution in the steady-state such that $\lim_{t\to\infty} E[w(t)] = \lim_{t\to\infty} w_o(t)$,
we follow similar lines as in Section 4.1.1. We modify (14), (15), (16) and
(17) such that $\lambda$ will be replaced by $w$, $\delta(t)$ will be replaced by $x(t)$ and
$\epsilon(t) = w_o(t) - w(t)$. After these replacements, we obtain

$$E[\epsilon(t+1)] = E\big[I - \mu\,\mathrm{diag}(w_1(t) + w_2(t))x(t)x^T(t)\big]E[\epsilon(t)] + E\big[w_o(t+1) - w_o(t)\big]. \tag{33}$$

Since we have $\lim_{t\to\infty} E[w_o(t+1) - w_o(t)] = 0$ for most adaptive filters in
the first stage [13], if $\mu$ is chosen so that all the eigenvalues of
$E\big[I - \mu\,\mathrm{diag}(w_1(t) + w_2(t))x(t)x^T(t)\big]$ have strictly less than unit magnitude for
every $t$, then $\lim_{t\to\infty} E[w(t)] = \lim_{t\to\infty} w_o(t)$.



For the transient analysis of MSE, defining $\gamma(t) = E\{u(t)y(t)\}$ and
$\Gamma(t) = E\{u(t)u^T(t)\}$, (18) is modified as

$$E[e^2(t)] = E\{y^2(t)\} - 2\bar{w}_a^T(t)\gamma(t) + \mathrm{tr}\big(E[w_a(t)w_a^T(t)]\Gamma(t)\big). \tag{34}$$

Accordingly, we modify the mean recursion (19) and the variance recursion
(20) such that instead of $\lambda_a(t)$ we use $w_a(t)$. We also modify Table 1
using $q_a(t) = \bar{w}_a(t)$ and $Q_a(t) = E[w_a(t)w_a^T(t)]$. If we use these modified
mean and variance recursions in (34), then we obtain the time evolution of
the final MSE. This completes the transient analysis of the unconstrained
mixture weights updated with the EGU algorithm.



4.2.2 Relative Entropy

For the unconstrained combination updated with the EG algorithm, we have
the multiplicative updates as

$$w_a^{(i)}(t+1) = u\, \frac{w_a^{(i)}(t) \exp\{\mu e(t)y_i(t)\}}{\sum_{k=1}^{m} \left[ w_a^{(k)}(t) \exp\{\mu e(t)y_k(t)\} + w_a^{(k+m)}(t) \exp\{-\mu e(t)y_k(t)\} \right]}, \quad i = 1, \ldots, m,$$

$$w_a^{(i)}(t+1) = u\, \frac{w_a^{(i)}(t) \exp\{-\mu e(t)y_{i-m}(t)\}}{\sum_{k=1}^{m} \left[ w_a^{(k)}(t) \exp\{\mu e(t)y_k(t)\} + w_a^{(k+m)}(t) \exp\{-\mu e(t)y_k(t)\} \right]}, \quad i = m+1, \ldots, 2m.$$

Following similar lines, we modify the mean and variance recursions derived in
Section 4.1.2, up to and including (29), such that we replace $\delta(t)$ with $x(t)$,
$\lambda$ with $w$ and use $u(t) = [x(t); -x(t)]$. Finally,
we use the modified mean and variance recursions in (34) and obtain the
time evolution of the final MSE. This completes the transient analysis of the
unconstrained mixture weights updated with the EG algorithm.



5 Simulations 

In this section, we illustrate the accuracy of our results and compare per- 
formances of different adaptive mixture methods through simulations. In our 
simulations, we observe that using the EG algorithm to train the mixture 
weights yields better performance compared to using the LMS algorithm or 
the EGU algorithm to train the mixture weights for combinations having more 
than two filters and when the combination favors only a few of the constituent 
filters. The LMS algorithm and the EGU algorithm perform similarly in our 
simulations when they are used to train the mixture weights. We also observe 
in our simulations that the mixture weights under the EG update converge to 
the optimum combination vector faster than the mixture weights under the 
LMS algorithm.

To compare performances of the EG and LMS algorithms and illustrate the
accuracy of our results in (27), (28) and (29) under different algorithmic
parameters, the desired signal as well as the system parameters are selected as
follows. First, a seventh-order linear filter,
$w_o = [0.25, -0.47, -0.37, 0.045, -0.18, 0.78, 0.147]^T$, is chosen as in [I?]. The
underlying signal is generated using the data model $y(t) = \tau w_o^T a(t) + n(t)$,
where $a(t)$ is an i.i.d. Gaussian vector process with zero mean and unit
variance entries, i.e., $E[a(t)a^T(t)] = I$, $n(t)$ is an i.i.d. Gaussian noise
process with zero mean and variance $E[n^2(t)] = 0.3$, and $\tau$ is a positive scalar
to control SNR. Hence, the SNR of the desired signal is given by
$\mathrm{SNR} = 10\log\big(E[\tau^2 (w_o^T a(t))^2]/0.3\big) = 10\log\big(\tau^2\|w_o\|^2/0.3\big)$. For the first experiment, we have
SNR = -10 dB. To model the unknown system we use ten linear filters using
the LMS update as the constituent filters. The learning rates of two of these
constituent filters are set to $\mu_1 = 0.002$ and $\mu_6 = 0.002$ while the learning
rates for the rest of the constituent filters are selected randomly in [0.1, 0.11].
Therefore, in the steady-state, we obtain the optimum combination vector
approximately as $\lambda_o = [0.5, 0, 0, 0, 0, 0.5, 0, 0, 0, 0]^T$, i.e., the final combination
vector is sparse. In the second stage, we train the combination weights with
the EG and LMS algorithms and compare performances of these algorithms.
For the second stage, the learning rates for the EG and LMS algorithms are
selected as $\mu_{EG} = 0.0008$ and $\mu_{LMS} = 0.005$ such that the MSEs of both
mixtures converge to the same final MSE to provide a fair comparison. We
select $u = 500$ for the EG algorithm. In Fig. 2a, we plot the weight of the
first constituent filter with $\mu_1 = 0.002$, i.e., $E[\lambda^{(1)}(t)]$, updated with the EG
and LMS algorithms. In Fig. 2b, we plot the MSE curves for the adaptive
mixture updated with the EG algorithm, the adaptive mixture updated with
the LMS algorithm, the first constituent filter with $\mu_1 = 0.002$ and the second
constituent filter with $\mu_2 \in [0.1, 0.11]$. From Fig. 2a and Fig. 2b, we see
that the EG algorithm performs better than the LMS algorithm such that the
combination weight under the update of the EG algorithm converges to 0.5
faster than the combination weight under the update of the LMS algorithm.
Furthermore, the MSE of the adaptive mixture updated with the EG algorithm
converges faster than the MSE of the adaptive mixture updated with the LMS
algorithm. In Fig. 2c, to test the accuracy of (27), we plot the theoretical values
for $\bar{\lambda}_a^{(1)}(t)$ and $\bar{\lambda}_a^{(10)}(t)$ along with simulations. Note in Fig. 2c we observe
that $\bar{\lambda}^{(1)}(t) = \bar{\lambda}_a^{(1)}(t) - \bar{\lambda}_a^{(10)}(t)$ converges to 0.5 as predicted in our derivations.
In Fig. 2d, to test the accuracy of (28) and (29), as an example, we plot the
theoretical values of $E[\lambda_a^{(1)}(t)^2]$ and $E[\lambda_a^{(1)}(t)\lambda_a^{(3)}(t)]$ along with simulations.
As we observe from Fig. 2c and Fig. 2d, there is a close agreement between
our results and simulations in these experiments. We observe similar results
for the other cross terms.
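For the data model used in this experiment, the signal power is $\tau^2\|w_o\|^2$ and the noise power is 0.3, so the scalar $\tau$ for a target SNR can be computed directly. The sketch below is an illustration only: the function names `tau_for_snr`, `snr_db` and `sample` are assumptions, and the logarithm is assumed to be base 10 (decibels).

```python
import math
import random

W_O = [0.25, -0.47, -0.37, 0.045, -0.18, 0.78, 0.147]  # filter taps from the text
NOISE_VAR = 0.3                                         # E[n^2(t)]


def snr_db(tau, w=W_O, noise_var=NOISE_VAR):
    """SNR in dB for y(t) = tau * w^T a(t) + n(t) with E[a a^T] = I."""
    energy = sum(c * c for c in w)
    return 10.0 * math.log10(tau * tau * energy / noise_var)


def tau_for_snr(target_db, w=W_O, noise_var=NOISE_VAR):
    """Positive scalar tau that yields the requested SNR (assumed base-10 log)."""
    energy = sum(c * c for c in w)
    return math.sqrt(noise_var * 10.0 ** (target_db / 10.0) / energy)


def sample(tau, rng, w=W_O, noise_var=NOISE_VAR):
    """Draw one (a(t), y(t)) pair from the data model."""
    a = [rng.gauss(0.0, 1.0) for _ in w]
    n = rng.gauss(0.0, math.sqrt(noise_var))
    return a, tau * sum(wi * ai for wi, ai in zip(w, a)) + n


tau = tau_for_snr(-10.0)                 # first experiment: SNR = -10 dB
a_t, y_t = sample(tau, random.Random(0))
```

The generated pairs would then drive the ten constituent LMS filters whose outputs form $x(t)$ for the mixture stage.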

We next simulate the unconstrained mixtures updated with the EGU and EG
algorithms. Here, the constituent filters are two linear filters, both using the
LMS update to train their weight vectors. The learning rates for the two
constituent filters are set to $\mu_1 = 0.002$ and $\mu_2 = 0.1$, respectively. Therefore, in
the steady-state, we obtain the optimum vector approximately as $w_o = [1, 0]$.
We have SNR = 1 for these simulations. The unconstrained mixture weights
are first updated with the EGU algorithm. For the second stage, the learning
rate for the EGU algorithm is selected as $\mu_{EGU} = 0.01$. The theoretical curves
in the figures are produced using $\Gamma(t)$ and $\gamma(t)$ that are calculated from the
simulations, since our goal is to illustrate the validity of derived equations. In
Fig. 3a, we plot the theoretical values of $\bar{w}_a^{(1)}(t)$, $\bar{w}_a^{(2)}(t)$, $\bar{w}_a^{(3)}(t)$ and $\bar{w}_a^{(4)}(t)$
along with simulations. In Fig. 3b, as an example, we plot the theoretical values
of $E[w_a^{(1)}(t)^2]$, $E[w_a^{(1)}(t)w_a^{(2)}(t)]$, $E[w_a^{(2)}(t)w_a^{(3)}(t)]$ and $E[w_a^{(3)}(t)w_a^{(4)}(t)]$
along with simulations. We continue to update the mixture weights with the
EG algorithm. For the second stage, the learning rate for the EG algorithm
is selected as $\mu_{EG} = 0.01$. We select $u = 3$ for the EG algorithm. In Fig. 3c,
we plot the theoretical values of $\bar{w}_a^{(1)}(t)$, $\bar{w}_a^{(2)}(t)$, $\bar{w}_a^{(3)}(t)$ and $\bar{w}_a^{(4)}(t)$ along
with simulations. In Fig. 3d, as an example, we plot the theoretical values of
$E[w_a^{(1)}(t)^2]$, $E[w_a^{(1)}(t)w_a^{(2)}(t)]$, $E[w_a^{(2)}(t)w_a^{(3)}(t)]$ and $E[w_a^{(3)}(t)w_a^{(4)}(t)]$ along
with simulations. We observe a close agreement between our results and
simulations.

To test the accurateness of the assumptions in ([9]) and (TTOl) . we plot in Fig. 
UK, the difference 

II exp {iJie{t){Ut) - ym{t))] - {1 + ^^e{t){Ut) - y^{t)))] f 

^11 exp {/ie(t)(y,(t) - y^{t))} P|| {1 + /ie(t)(y,(t) - y,n(t)))} P 

for i = 1 with the same algorithmic parameters as in Fig. [2] and Fig. |3l To 

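test the accuracy of the separation assumption in (27), we plot in Fig. 4b
the first parameter of the difference

$$
\frac{\left\| E\left[ u \, \dfrac{[I + \mu e(t)\,\mathrm{diag}(u(t))]\,\lambda_a(t)}{[\mathbf{1}^T + \mu e(t)\,u^T(t)]\,\lambda_a(t)} \right]
      - u \, \dfrac{E\{[I + \mu e(t)\,\mathrm{diag}(u(t))]\,\lambda_a(t)\}}{E\{[\mathbf{1}^T + \mu e(t)\,u^T(t)]\,\lambda_a(t)\}} \right\|^2}
     {\left\| E\left[ u \, \dfrac{[I + \mu e(t)\,\mathrm{diag}(u(t))]\,\lambda_a(t)}{[\mathbf{1}^T + \mu e(t)\,u^T(t)]\,\lambda_a(t)} \right] \right\|
      \, \left\| u \, \dfrac{E\{[I + \mu e(t)\,\mathrm{diag}(u(t))]\,\lambda_a(t)\}}{E\{[\mathbf{1}^T + \mu e(t)\,u^T(t)]\,\lambda_a(t)\}} \right\|}
$$

with the same algorithmic parameters as in Fig. 2 and Fig. 3. We observe
that the assumptions are fairly accurate for these algorithms in our simulations.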
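
The first-order approximation behind this test can be checked numerically in isolation. The sketch below is only an illustration: `x` stands in for the scalar exponent $\mu e(t)(y_i(t) - y_m(t))$, the sample values of `x` are hypothetical, and the normalized difference is assumed to be the squared gap divided by the product of the two magnitudes.

```python
import numpy as np

def normalized_gap(x):
    """|exp(x) - (1 + x)|^2 / (|exp(x)| * |1 + x|) for scalar or array x."""
    a = np.exp(x)        # exact multiplicative factor
    b = 1.0 + x          # first-order approximation of the factor
    return np.abs(a - b) ** 2 / (np.abs(a) * np.abs(b))

# Hypothetical magnitudes of the exponent mu * e(t) * (y_i(t) - y_m(t)).
x = np.array([1e-3, 1e-2, 1e-1])
gaps = normalized_gap(x)
```

Since exp(x) - (1 + x) behaves like x^2 / 2 for small x, the normalized gap shrinks roughly as x^4 / 4, which is why a small learning rate keeps the approximation tight.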

In the last simulations, we compare the performance of the EGU, EG and LMS
algorithms updating the affinely constrained mixture weights under different
algorithmic parameters. Algorithmic parameters and constituent filters are
selected as in Fig. 2, under SNR = -5 dB and SNR = 5 dB. For the second stage,
under SNR = -5 dB, the learning rates for the EG, EGU and LMS algorithms are
selected as $\mu_{EG} = 0.0005$, $\mu_{EGU} = 0.005$ and $\mu_{LMS} = 0.005$ so that the MSEs
converge to the same final MSE, which provides a fair comparison. We choose
$u = 500$ for the EG algorithm. In Fig. 5a, we plot the MSE curves for the
adaptive mixture updated with the EG algorithm, the adaptive mixture updated
with the EGU algorithm, the adaptive mixture updated with the LMS algorithm,
the first constituent filter with $\mu_1 = 0.002$ and the second constituent filter
with $\mu_2 \in [0.1, 0.11]$ under SNR = -5 dB. Under SNR = 5 dB, the learning rates
for the EG, EGU and LMS algorithms are selected as $\mu_{EG} = 0.002$,
$\mu_{EGU} = 0.005$ and $\mu_{LMS} = 0.005$. We choose $u = 100$ for the EG algorithm. In
Fig. 5b, we plot the same MSE curves as in Fig. 5a. We observe that the EG
algorithm performs better than the EGU and LMS algorithms, in that the MSE of
the adaptive mixture updated with the EG algorithm converges faster than the
MSEs of the adaptive mixtures updated with the EGU and LMS algorithms. We also
observe that the EGU and LMS algorithms show similar performance when they are
used to train the mixture weights.
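
A rough sketch of this kind of comparison for the affinely constrained case is given below. It is not the simulation setup of the paper: the constituent outputs are synthetic stand-ins (the desired signal plus noise of different levels) rather than LMS filters, and the function names, step sizes, u and signal model are all hypothetical. The mixture output is written as $y_m(t) + \lambda^T \delta(t)$ with $\delta_i(t) = y_i(t) - y_m(t)$, so the full weight vector sums to one by construction.

```python
import numpy as np

def affine_output(lam, y):
    """Return y_m + lam^T delta and delta, where delta_i = y_i - y_m."""
    delta = y[:-1] - y[-1]
    return y[-1] + lam @ delta, delta

def lms_affine_step(lam, y, d, mu):
    """One LMS step on the m-1 free weights of the affine mixture."""
    y_hat, delta = affine_output(lam, y)
    e = d - y_hat
    return lam + mu * e * delta, e

def eg_affine_step(lam_a, y, d, mu, u):
    """One EG step on lam_a = [lam1; lam2], lam = lam1 - lam2 (assumed form)."""
    k = lam_a.shape[0] // 2
    y_hat, delta = affine_output(lam_a[:k] - lam_a[k:], y)
    e = d - y_hat
    r = lam_a * np.exp(mu * e * np.concatenate([delta, -delta]))
    return u * r / r.sum(), e

rng = np.random.default_rng(0)
T, m, u = 2000, 3, 5.0
noise_std = np.array([0.1, 1.0, 1.0])      # hypothetical: first output is the accurate one
lam_lms = np.zeros(m - 1)                  # mixture starts on the last constituent
lam_a = np.full(2 * (m - 1), u / (2 * (m - 1)))  # lam1 = lam2, so lam starts at zero
err_lms, err_eg = np.empty(T), np.empty(T)
for t in range(T):
    d = rng.standard_normal()                      # synthetic desired signal
    y = d + noise_std * rng.standard_normal(m)     # synthetic constituent outputs
    lam_lms, err_lms[t] = lms_affine_step(lam_lms, y, d, mu=0.005)
    lam_a, err_eg[t] = eg_affine_step(lam_a, y, d, mu=0.005, u=u)
```

Averaging `err_lms**2` and `err_eg**2` over independent runs would give MSE curves of the kind compared here; matching the final MSEs of the two algorithms requires tuning the step sizes, as done for the learning rates reported above.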



6 Conclusion 

In this paper, we investigate adaptive mixture methods based on Bregman 
divergences combining outputs of m adaptive filters to model a desired signal. 
We use the unnormalized relative entropy and relative entropy as distance 
measures that produce the exponentiated gradient update with unnormalized 
weights (EGU) and the exponentiated gradient update with positive and
negative weights (EG) to train the mixture weights under an affine constraint or
without any constraints. We provide the transient analysis of these methods
updated with the EGU and EG algorithms. In our simulations, we compare 
performances of the EG, EGU and LMS algorithms and observe that the EG 
algorithm performs better than the EGU and LMS algorithms when the com- 
bination vector in steady-state is sparse. We observe that the EGU and LMS 
algorithms show similar performance when they are used to train the mixture 
weights. We also observe a close agreement between the simulations and our 
theoretical results. 

Fig. 2. Using 10 LMS filters as constituent filters, where learning rates for 2
constituent filters are $\mu = 0.002$ and for the rest are $\mu \in [0.1, 0.11]$.
SNR = -10 dB. For the mixture stage, the EG algorithm has $\mu_{EG} = 0.0008$ and
the LMS algorithm has $\mu_{LMS} = 0.005$. For the EG algorithm, $u = 500$. (a) The
weight of the first constituent filter in the mixture, i.e., $E[\lambda^{(1)}(t)]$.
(b) The MSE curves for the adaptive mixture updated with the EG algorithm, the
adaptive mixture updated with the LMS algorithm, the first constituent filter
and the second constituent filter. (c) Theoretical values of two entries of
$\lambda_a(t)$ and simulations. (d) Theoretical values $E[\lambda_a^{(1)}(t)^2]$ and
$E[\lambda_a^{(1)}(t) \lambda_a^{(3)}(t)]$ and simulations.

Fig. 3. Two LMS filters as constituent filters with learning rates $\mu_1 = 0.002$
and $\mu_2 = 0.1$, respectively. SNR = 1 dB. For the second stage, the EGU
algorithm has $\mu_{EGU} = 0.01$ and the EG algorithm has $\mu_{EG} = 0.01$. For the EG
algorithm, $u = 3$. (a) Theoretical values for the mixture weights updated with
the EGU algorithm and simulations. (b) Theoretical values $E[w_a^{(1)}(t)^2]$,
$E[w_a^{(1)}(t) w_a^{(2)}(t)]$, $E[w_a^{(2)}(t) w_a^{(3)}(t)]$ and $E[w_a^{(3)}(t) w_a^{(4)}(t)]$ and
simulations. (c) Theoretical mixture weights updated with the EG algorithm and
simulations. (d) Theoretical values $E[w_a^{(1)}(t)^2]$, $E[w_a^{(1)}(t) w_a^{(2)}(t)]$,
$E[w_a^{(2)}(t) w_a^{(3)}(t)]$ and $E[w_a^{(3)}(t) w_a^{(4)}(t)]$ and simulations.

[Fig. 5 panel titles: "MSEs of the constituent filters and adaptive mixtures, SNR = -5dB" and "MSEs of the constituent filters and adaptive mixtures, SNR = 5dB"; legend entry: "MSE of the first constituent filter".]

Fig. 5. Algorithmic parameters and constituent filters are selected as in Fig. 2
under SNR = -5 dB. For the second stage, the EG algorithm has $\mu_{EG} = 0.0005$,
the EGU algorithm has $\mu_{EGU} = 0.005$ and the LMS algorithm has
$\mu_{LMS} = 0.005$. For the EG algorithm, $u = 500$. (a) The MSE curves for the
adaptive mixture updated with the EG algorithm, the adaptive mixture updated
with the EGU algorithm, the adaptive mixture updated with the LMS algorithm,
the first constituent filter and the second constituent filter. Next,
SNR = 5 dB. For the second stage, the EG algorithm has $\mu_{EG} = 0.002$, the EGU
algorithm has $\mu_{EGU} = 0.005$ and the LMS algorithm has $\mu_{LMS} = 0.005$. For
the EG algorithm, $u = 100$. (b) The MSE curves for the adaptive mixture updated
with the EG algorithm, the adaptive mixture updated with the EGU algorithm,
the adaptive mixture updated with the LMS algorithm, the first constituent
filter and the second constituent filter.